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Abstract 

The Matern covariance function is a popular choice for modeling dependence in spatial environmental data. 
Standard Matern covariance models are, however, often computationally infeasible for large data sets. In this 
work, recent results for Markov approximations of Gaussian Matern fields based on Hilbert space approxima- 
tions are extended using wavelet basis functions. These Markov approximations are compared with two of the 
most popular methods for efficient covariance approximations; covariance tapering and the process convolution 
method. The results show that, for a given computational cost, the Markov methods have a substantial gain in 
accuracy compared with the other methods. 
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On ■ 1 Introduction 

The traditional methods in spatial statistics were typically developed without any considerations of computa- 
tional efficiency. In many of the classical applications of spatial statistics in environmental sciences, the cost for 
obtaining measurements limited the size of the data sets to ranges where computational cost was not an issue. 
Today, however, with the increasing use of remote sensing satellites, producing many large climate data sets, 
computational efficiency is often a crucial property. 

In recent decades, several techniques for building computationally efficient models have been suggested. 
In many of these techniques, the main assumption is that a latent, zero mean Gaussian process X(s) can be 
expressed, or at least approximated, through some finite basis expansion 

n 

X(s) = J>^-(s), (1) 

where Wj are Gaussian random variables, and {£j}™ =1 are pre-defined basis functions. The justification for using 
these basis expansions is usually that they converge to the true spatial model as n tends to infinity. However, 
for a finite n, the choice of the weights and basis functions will greatly affect the approximation error and the 
computational efficiency of the model. Hence, if one wants an accurate model for a given computational cost, 
asymptotic arguments are insufficient. 

If the process X(s) has a discrete spectral density, one can obtain an approximation on the form (Q~|) by 
truncating the spectral expansion of the process. Another way to obtain an, in some sense optimal, expansion 
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on the form ([]]) is to use the the eigenfunctions of the covariance function for the latent field X(s) as a basis, 
which is usually called the Karhunen-Loeve (KL) transform. The problem with the KL transform is that analytic 
expressions for the eigenfunctions are only known in a few simple cases, which are often insufficient to represent 
the covariance structure in real data sets. Numerical approximations of the eigenfunctions can be obtained for a 
given covariance function; however, the covariance function is in most cases not known, but has to be estimated 
from data. In these cases, it is infeasible to use the KL expansion in the parameter estimation, which is often the 
most computationally demanding part of the analysis. The spectral representation has a similar problem since 
the computationally efficient methods are usually restricted to stationary models with gridded data, and are not 
applicable in more general situations. Thus, to be useful for a broad range of practical applications, the methods 
should be applicable to a wide family of stationary covariance functions, and be extendable to nonstationary 
covariance structures. 

One method that fulfills these requirements is the process convolution approach ( Barry and Ver Hoel , 19961 
IffigdonL l200ll ICressie and Ravlicoval . 120021 iRodrigues and Digglel bold) . In this method, the stochastic field, 
X(s), is defined as the convolution of a Gaussian white noise process with some convolution kernel k(s). This 
convolution is then approximated with a sum on the form CO) to get a discrete model representation. Process 
convolution approximations are computationally efficient if a small number of basis functions can be used, but 
in practice, this will often give a poor approximation of the continuous convolution model. 



A popular method for creating computationally efficient approximations is covariance tapering ((Furrer et al 



2006). This method can not be written as an approximation on the form (fl}, but the idea is instead to taper 



the true covariance to zer o beyond a cert ain range by multiplying the covariance function with some compactly 
supported taper function This facilitates the use of sparse matrix techniques that increases the 

computational efficiency, at the cost of replacing the original model with a different model, which can lead to 
problems depending on the spatial structure of the data locations. However, the method is applicable to both 
stationary and nonstationary covariance models, and instead of choosing the set of basis functions in £[]), the 
tape r range and the tap er function has to be chosen. 



Nychka et all (120021) used a wavelet basis in the expansion (Q]), and showed that by allowing for some cor- 



relation among the random variables Wj, one gets a flexible model that can be used for estimating nonstationary 
covariance structures. As a motivating example, they showed that using a wavelet basis, computationally efficient 
approximations to the popular Matern covariance functions can be obtained using only a few nonzero correlations 
for the weights Wj. The approximations were, however, obtained numerically, and no explicit representations 
we re derived. 

Rue and Tjelmeland (2002) showed that general stationary covariance models can be clo sely approx i mated 



by Markov random fields, by numerically minimizing the error in the resulting covariances. ISong et al.1 (|2008|) 
extended the method by applying different loss criteria, such as minimizing the spectral error or the Kullback- 
Leibler divergence. A drawback of the methods is that, just as for the KL and wavelet approaches, the numerical 
optimisation must in general be perform ed for each distinct parameter configuration. 

Recently, iLindgren and Rue! (12007b derived an explicit method for producing computationally efficient ap- 
proximations to the Matern covariance family. The method uses the fact that a random process on M. d with a 
Matern covariance function is a solution to a certain stochastic partial differential equation (SPDE). By consider- 
ing weak solutions to this SPDE with respect to some set of local basis functions {£?}" =1 , an approximation on 
the form (fl]) is obtained, where the stochastic weights have a sparse precision matrix (inverse covariance matrix), 
that can be written directly as a function of the parameters, without any need for costly numerical calculations. 
The m ethod is also extendab l e to more general stationar y and nonstationary models by extending the generating 
SPDE jLindgren et all lioiUlBolin and Lindgrerlboill). 



In this paper, we use methods from ILindgren and Rud (|2007) and Lindg ren et all (1201 11) to algebraically 
compute the weights Wj for wavelet based approximations to Gaussian Matern fields (Section |3). For certain 
wavelet bases, the weights form a Gaussian Markov Random Field (GMRF), which greatly increases the compu- 
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tational efficiency of the approximation. For other wavelet bases, such as the one used in [Nychka et al.1 (|2002), 
the weights can be well approximated with a GMRF. 

In order to evaluate the practical usefulness of the different approaches, a detailed analysis of the computa- 
tional aspects of the spatial prediction problem is performed (Section |2] and Section|4]). The results show that the 
GMRF methods are more efficient and accurate than both the process convolution approach and the covariance 
tapering method. 



2 Spatial prediction and computational cost 

As a motivating example for why computational efficiency is important, consider spatial prediction. The most 
widely used method for spatial prediction is commonly known as linear kriging in geostatistics. Let Y(s) be an 
observation of a latent Gaussian field, X(s), under mean zero Gaussian measurement noise, £ (s), uncorrected 
with X and with some covariance function rg(s, t), 

Y(s) = X(s) + £(s), (2) 

and let fi(s) and r(s, t) be the mean value function and covariance function for X(s) respectively. Depending on 
the assumptions on //(s), linear kriging is usually divided into simple kriging (if fi is known), ordinary kriging 
(if 11 is unknown but independent of s), and universal kriging (if \x is unknown and can be expressed as a linear 
combination of some deterministic basis functions). To limit the scope of this article, parameter estimation will 
not be considered, and to simplify the notations, we let /x(s) = 0. It should, however, be noted that all results in 
later sections regarding computatio nal eff i ciency al so hold in the cases of ordinary kr iging and universal kriging. 
For more details on kriging, see e.g lSteinl (119991) or lSchabenberger and Gotwayl (120051) . 



Let r(s, t) have some parametric structure, and let the vector 7 contain all covariance parameters. Let Y be 
a vector containing the observations, Xi be a vector containing X(s) evaluated at the measurement locations, 
si, . . . , s m , and let X2 be a vector containing X(s) at the locations, §1, . . . , s^, for which the kriging predictor 
should be calculated. With X = (X^, Xj) T , one has Xi = AiX, and X2 = A2X for two diagonal matrices 
Ai and A2, and the model can now be written as 

X| 7 ~ N(0,£jt), 
Y|X~N(A!X,Sg), 

where Y±x is the covariance matrix for X and Eg contains the covariances rg (sj , Sj ) It is straightforward to show 
that X|Y,7 ~ N(£AiEj 1 Y, £), where £ = (S^ 1 + AjE^Ai) -1 , and the well known expression for the 
kriging predictor is now given by the conditional mean 

E(X 2 |Y, 7 ) = A 2 ±A 1 'E £ 1 Y = A 2 £ x a7(A 1 £ x a7 + E^Y 

= E X2Xl (5] Xl +S £ )- 1 Y = S X2Xl S y ; 1 Y, (3) 

where the elements on row i and column j in 'Ex 2 x 1 and £y are given by the covariances r(§j, Sj) and r(sj, Sj) + 
re(si,Sj) respectively. To get the standard expression for the variance of the kriging predictor, the Woodbury 
identity is used on S: 

V(X 2 |Y, 7 ) = A 2 (S X X + AjE^A^Aj 

= A 2 £ X A 2 - A 2 E x a7(A 1 E x a7 + ^A^xAj 

= Sx 2 - HxaXi^Y^XiX!- 
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If there are no simplifying assumptions on Sj, the computational cost for calculating the kriging predictor is 
0{rhm + m 3 ), and the cost for calculating the variance is even higher. This means that with 1000 measurements, 
the number of operations needed for the kriging prediction for a single location is on the order of 10 9 . These 
computations are thus not feasible for a large data set where one might have more than 10 6 measurements. 

The methods described in Section Q] all make different approximations in order to reduce the computational 
cost for calculating the kriging predictor and its variance. These different approximations, and their impact on 
the computational cost, are described in more detail in Section HI however, to get a general idea of how the 
computational efficiency can be increased, consider the kriging predictor for a model on the form £T|). The field 
X can then be written as X = Bw ~ N (0, BS W B T ), where column i in the matrix B contains the basis function 
£i(s) evaluated at all measurement locations and all locations where the kriging prediction is to be calculated. 
Let Bi = AiB and B2 = A2B be the matrices containing the basis functions evaluated at the measurement 
locations and the kriging locations respectively. The kriging predictor is then 

E(X 2 |Y, 7 ) = B 2 (£^ + B^E^BiJ-^iS^Y. (4) 

If the measurement noise is Gaussian white noise, is diagonal and easy to invert. If S m ' is either known, or 
easy to calculate, the most expensive calculation in (@]) is to solve u = (E" 1 + B 1 r S^ 1 Bi) _1 BiI]^ 1 Y. This 
is a linear system of n equations, where n is the number of basis functions used in the approximation. Thus, the 
easiest way of reducing the computational cost is to choose n <C m, which is what is done in the convolution 
approach. Another approach is to ensure that (S^ 1 + B 1 r 5]^ 1 Bi) is a sparse matrix. Sparse matrix techniques 
can then be used to calculate the kriging predictor, and the computational cost can be reduced without reducing 
the number of basis functions in the approximation. If a wavelet basis is used, B 1 r 5]^ 1 Bi will be sparse, and in 
Section [3j it is shown that the precision matrix Q,„ = E" 1 ca n also be chosen as a sparse matrix by using the 



Hilbert space approximation technique by lLindgren et all (|201lf) . 



3 Wavelet approximations 



In the remainder of this paper, the focus is on the family of Matern covariance functions (|Matern . 1960 ) and the 



computational efficiency of some different techniques for approximating Gaussian Matern fields. Th i s sect ion 



shows how wavelet bases can be used in the Hilbert space approximation technique by Ondgren et all (|201lh to 
obtain computationally efficient Matern approximations. 

3.1 The Matern covariance family 



Beca use of its versatility, the Matern covariance family is the most popular choice for modeling spatial data (Stein, 
Il999h . There are a few different parameterizations of the Matern covariance function in the literature, and the 
one most suitable in our context is 

r(h) = - («||h||)^(«||h||), (5) 

(4^)2r(z^ + \)k 2 » 

where v is a shape parameter, k 2 a scale parameter, 4> 2 a variance parameter, and K v is a modified Bessel 
function of the second kind of order v > 0. With this parametrization, the variance of a field with this covariance 
is r(0) = (j) 2 r(u)(47r)~2T(u + |) _1 k~ 2z/ , and the associated spectral density is 

= J — 77T - (6) 



(2vr) d ( K 



+ M 
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For the special case v = 0.5, the Matern co variance function is the exponential covariance function. The smooth- 
ness of the field increases with v, and in the limit as u — > oo, the covariance function is a Gaussian covariance 
function if k is also scaled accordingly, which gives an infinitely differentiable field. 



3.2 Hilbert space approximations 



As noted by lWhittld (119631 ). a random process with the covariance ((5]) is a solution to the SPDE 

(K 2 -A)fl(s) = ^W(s), 



(7) 



where W(s) is Gaussian white noise, A is the Laplacian, and a = v + d/2. The key idea in iLindgren et al. 
d201ll) is to approximate the solution to the SPDE using a basis expansion on the form (Q]). The starting point of 
the approximation is to consider the stochastic weak formulation of the SPDE 



h, (K 2 -A)fx),i = l,...,n f) } ±{(bi, 0W),i = l,...,n 6 }. 



(8) 



Here = denotes equality in distribution, (/, g) = J f(s)g(s) ds, and equality should hold for every finite set of 
test functions i = 1, . . . , n^} from some appropriate space. A finite element approximation of the solution 
X is then obtained by representing it as a finite basis expansion on the form (fl]), where the stochastic weights are 
calculated by requiring ([8]) to hold for only a specific set of test functions |fr t ;, i = 1, . . . , n| an d {^} is a set of 
predetermined basis functions. We illustrate the more general results from Lindgren et al. ( 201 lb with the special 
case a = 2, where one uses bi = £j and one then has 



(6, (k 2 - A)X) =J2 w i <&> (** ~ A )^> 



(9) 



By introducing the matrix K with elements Ky = (k 2 — A and the vector w 
left hand side of (HJ can be written as Kw. Since, by Lemma 1 in ILindgren et all (|201 It) 



(wi, . . . ,w n ) T , the 



(k 2 - A)^> = k 2 Ci) - A^) = k 2 Ci) + VCj 



the matrix K can be written as the sum K = k 2 C + G where Cjj = and Gij = (V^j, V^). The 

right hand side of © can be shown to be Gaussian with mean zero and covariance <ft 2 C and thus get that w ~ 
N(0> 2 K- 1 CK- 1 ). 

For the second fundamental case, a = 1. ILindgren et all (120111) show that w ~ N(0, 2 K _1 ) and for higher 
order a G N, the weak solution is obtained recursively using these two fundamental cases. For example, if a = 4 
the solution to (k 2 — A) 2 Xo(s) = (f>W(s) is obtained by solving (k 2 — A)Xq(s) = X(s), where X is the 
solution for the case a = 2. This results in a precision matrix for the weights Q Q defined recursively as 



Q Q = KC ^CL _ 2 C- X K, 



o 



3,4,... 



(10) 



where Q x = 0~ 2 K and Q 2 = 0~ 2 K 1 C" ^. Thus, all Matern fields with v + d/2 £ N c an be approximated 
throu gh this procedure. For mo re details, see ILindgren and Rue! (|2007) and ILindgren et all (1201 lh . The results 
from Rue and Tjelmelandl (|2002h show that accurate Markov approximations exist also for other ^-values, and 



one approxima t e appr oach to finding explicit expressions for such models was given in the autho rs' response i n 
Lindgren et al.l (|201 lh . However, in many practical applications v cannot be estimated reliably (|ZhangL |2004), 
and using only a discrete set of ^-values is not necessarily a significant restriction. 
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3.3 Wavelet basis functions 



In the previous section, nothing was said about how the the basis functions should be chosen. The following 
sections, however, shows that wavelet bases have many desirable properties which makes them suitable to use 
in the Hilbert space approximations on M. d . In this section, a brief introduction to multiresolution analysis and 
wavelets is given. 

A multiresolution analysis on R is a sequence of closed approximation subspaces {Vj}j^z of functions in 
L 2 (R) such that Vj C V j+1 , cl\J j€Z Vj = L 2 (R), and f| jeZ Vj = {0}, where cl is the closure, and f(s) £ Vj if 
and only if f(2~ 3 s) G Vq. This last requirement is the multiresolution requirement because this implies that all 
the approximation spaces Vj are scaled versions of the space Vq. A multiresolution analysis is generated starting 
with a function usually called a father function or a scaling function. The function 99 G I? (R) is called a scaling 
function for {Vj}j & z if it satisfies the two-scale relation 

V^) = ^2pk<p{2s - k), (11) 

for some square-summable sequence {pk}k&z and the translates {ip(s — k)}k& form an orthonormal basis for 
Vq. Given the multiresolution analysis {Vj}j e z, the wavelet spaces {Wj}j^z are then defined as the orthogonal 
complements of Vj in Vj+i for each j, and one can show that Wj is the span of {^(2 J s — k)}kez, where the 
wavelet ip is defined as ip(s) = YlkeZ,(~ ^Pi-kfi^s — k). 

Given the spaces Wj, Vj can be decomposed as the direct sum 

Vj=V Q ®W ®W 1 ®...®Wj-i. (12) 

Several choices of scaling functions have been presented in the literature. Among the mos t widely used con- 



structions are the B-spline wavelets (IChui and WangL 1 19921) and the Daubechies wavelets (IDaubechiesl . 1 19921) 



that both have several desirable properties for our purposes. 

The scaling function of B-spline wavelets are m:th order B-splines with knots at the integers. Because of 
this, there exists closed form expressions for the corresponding wavelets, and the wavelets have compact support 
since the m:th order scaling function has support on (0, m + 1). The wavelets are orthogonal at different scales, 
but translates at the same scale are not orthogonal. This property is usually referred to as semi-orthogonality. 

The Daubechies wavelets form a hierarchy of compactly supported orthogonal wavelets that are constructed 
to have the highest number of vanishing moments for a given support width. This generates a family of wavelets 
with an increasing degree of smoothness. Except for the first Daubechies wavelet, there are no closed form 
expressions for these wavelets; however, for practical purposes, this is not a problem because the exact va lues for 



the wavelets at dyadic points can be obtained very fast using the Cascade algorithm (Bur rus et al.L 1 1988h . In this 
work, the DB3 wavelet is used because it is the first wavelet in the family that has one continuous derivative. The 
DB3 wavelet and its scaling function are shown in Figured] 

3.4 Explicit wavelet Hilbert space approximations 

To use the Hilbert space approximation for a given basis, the precision matrix for the weights Q a has to be 
calculated. By (TTOt . we only have to be able to calculate the matrices C and G to built the precision matrix for 
any a G N. The elements in these matrices are inner products between the basis functions: 

C fli = y 6(8)6(8) ds, G id =y'(V&(s)) T V£ i (s)ds. (13) 

This section shows how these elements can be calculated for the DB3 wavelets and the B-spline wavelets. When 
using a wavelet basis in practice, one always have to choose a finest scale, J, to work with. Given that the 
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Scaling function Wavelet 




Figure 1: The DB3 scaling function and wavelet. 

subspace Vj is used as an approximation of L 2 (M), one can use two different bases. Either one works with the 
direct basis for Vj, that consists of scaled and translated versions of the father function ip(s), or one can use the 
multiresolution decomposition (fl~2l) . In what follows, both these cases are considered. 



3.4.1 Daubechies wavelets on R 

For the Daubechies wavelets, the matrix C is the identity matrix since these wavelets form an orthonormal basis 
for L 2 (M). Thus, only the matrix G has to be calculated. If the direct basis for Vj is used, G contains inner 
products on the form 

(Vy?(2 J s - k), Vy?(2 J s - I)) = 2 J (V(p(s), Vy?(s -l + k)) = 2 J A(k - I). (14) 

Because the scaling function has compact support on [0,2N — 1], these inner products are only non-zero if 
k — I G [— (2N — 2), 2N — 2]. Thus, the matrix G is sparse, which implies that the weights w in (Q]) form 
a GMRF Since there are no closed form expressions for the Daubechies wavelets, there is no hope in finding 
a closed form expression for the non-zero inner products (fl"4l . Furthermore, standard numerical quadrature for 
calculating the inner products is too inaccurate due to the highly oscillating nature of the gradients. However, 
utilizing properties of the wavelets, one can calculate an approximation of the inner product of arbitrary precision 
by solving a system of linear equations. It is outside the scope of this paper to present the full method, but the 
basic principle is to construct a system of linear equations by using the scaling- and mo ment equations fo r the 



wavelets. This system is then solved using, for example, LU factorization. For details, see lLatto et al.l (119911) 



Using this technique for the DB3 wavelets, the following nonzero values for A(i]) are obtained 

A(0) = 5.267, A(±l) = -3.390, A(±2) = 0.876, 

A(±3) = -0.114, A(±4) = -0.00535. 

These values are calculated once and tabulated for constructing the G matrix, which is a band matrix with 2 J A(0) 
on the main diagonal, 2 J A(1) on the first off diagonals, et cetera. 

If the multiresolution decomposition (fT2l) is used as a basis for Vj, one also needs the inner products 

{V^s-k),^^^-!)) , i,jeZ. 

Because of the two-scale relation (TTTb . every wavelet tp(2^ s — k) can be written as a finite sum of the scaling 
function at scale J. Using this property, the G matrix can be constructed efficiently using only the already 
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Multiresolution basis Direct basis 




200 400 600 800 200 400 600 800 

nz = 55798 nz = 8368 



Figure 2: The non-zero elements in the G matrices for a multiresolution DB3 basis with five layer of wavelets 
and the corresponding direct basis. 6.4% of the elements are non-zero for the multiresolution basis whereas only 
0.96% of the elements are non-zero for the direct basis. 



computed A values. Figure [2] shows the structure of the G matrices for a multiresolution DB3 basis with five 
layers of wavelets and the corresponding direct basis. Note that there are fewer non-zero elements in the precision 
matrix for the direct basis. Hence, it is more computationally efficient to use the direct basis instead of the 
multiresolution basis. 



3.4.2 B-spline wavelets on ' 



For the B-spline wavelets, the matrices C and G can be calculated directly from the closed form expressions for 
the basis functions and their derivatives. When a direct basis is used on M, C is a band matrix with bandwidth 
m + 1, if the m:th order spline wavelet is used. For example, for m = 1, calculating ( [TBI gives 



2/3, i = j, 
1/6, |i-j| = l, 
otherwise, 



G 



(2. 

otherwise. 



i = J, 



Since the expression for the precision matrix for the weights w contains the inverse of C, it is a dense matrix. 
Hence, C -1 h a s to b e approximated with a sparse matrix if Q should be sparse. This issue is addressed in 
Lindgren et al. ( 201 lb by lowering the integration order of which results in an approximate, diagonal 

C matrix, C, with diagonal elements Cu = Ylk=l ^ik- In Section |4j the effect of this approximation on the 
covariance approximation for the basis expansion is studied in some detail. For the multiresolution basis, the 
matrices are block diagonal, and this approximation is not applicable. 



3.4.3 Wavelets on M. d 

The easiest way of constructing a wavelet basis for L 2 (M d ) is to use the tensor product functions generated by 
d one-dimensional wavelet bases. Let ip be the scaling function for a multiresolution on R, the father function 
can be written as <p(x\, . . . , Xd) = Y\i=i <f( x i)- The scalar product (Vc^(x), V(/j(x + rj)), where r\ now is a 
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multi-integer shift in d dimensions, can then be written as 

Id d \ 

(V£(x), Vc^x + r/)) = Vjl^i), + 

\ i=l i=l I 

i=l j^i 
d „ 

= Mvi) n / < p( x j) i p( x j + dx J- 

77 77 

This expression looks rather complicated, but it implies a very simple Kronecker structure for G^, the G matrix 
in R d . For example, in R 2 and R 3 , 

G 2 = Gi ® Ci + Ci ® Gi 

G 3 = Gi ® Ci <g> Ci + Ci <g> Gi ® Ci + Ci ® Ci <8i Gi, 

where Gi and Ci are the G and C matrices for the corresponding one-dimensional basis and (g> denotes the 
Kronecker product. Similarly, C 2 = Ci <S> C 1; and C3 = Ci ® Ci (g) Ci. These expressions hold both if the 
direct basis for Vj if used or if the multiresolution construction (fl2l ) is used for the one-dimensional spaces. For 
Daubechies wavelets, the C matrix is the identity matrix for all d > 1. This also holds for the direct B -spline 
basis if the diagonal approximation is used for Ci. 

4 Comparison 

As discussed in Section |2] is computational efficiency often an important aspect in practical applications. How- 
ever, the computation time for obtaining for example an approximate kriging prediction is in itself not that inter- 
esting unless one also knows how accurate it is. We will therefore in this section compare the wavelet Markov 
approximations with two other popular methods, covariance tapering and process convolutions, with respect to 
their accuracy and computationally efficiency when used for kriging. 

Before the comparison, we give a brief introduction to the process convolution method and the covariance 
tapering method and discuss the methods' computational properties. As mentioned in Section |2j the computa- 
tional cost for the kriging prediction for a single location based on m observations is 0(m 3 ). In what follows, the 
corresponding computational costs for the three different approximation methods are presented. We start with the 
wavelet Markov approximations and then look at the process convolutions and the covariance tapering method. 
After this, an initial comparison of the different wavelet approximations is performed in Section [44] and then the 
full kriging comparison is presented in Section I431I4.6I 

4.1 Wavelet approximations 

When using a wavelet basis, one can either work with the direct basis for the approximation space Vj or do 
the wavelet decomposition into the direct sum of J — 1 wavelet spaces and Vq. If one only is interested in the 
approximation error, the decomposition into wavelet spaces is not necessary and it is more efficient to work in 
the direct basis for Vj since this will result in a precision matrix with fewer nonzero elements. Therefore we only 
use the direct bases for Vj in the comparisons in this section. 

The wavelet approximations are on the form (Q~|), so Equation (@]) is used to calculate the kriging predictor. 
However, since an explicit expression for the precision matrix for the weights w exists for this method, we rewrite 
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the equation as 



E(X 2 |Y, 7 ) = B 2 (Q W + B7Q £ B 1 )- 1 B 1 Q^Y, 

where = 1 is diagonal if £ is Gaussian white noise. If the number of kriging locations is small, the 
computationally demanding step is again to solve a system on the form 

u = (q w + b7q £ b 1 )- 1 v. 

Now, if the Daubechies wavelets or the Markov approximated spline wavelets are used, both Q w and B^Q^Bi 
are sparse and positive definite matrices. The system is therefore most efficiently solved using Cholesky factoriz- 
ation, forward substitution, and back substitution. The forward substitution and back substitution are much faster 
than calculating the Cholesky triangle L, so the computational complexity for the kriging predictor is determined 
by the calculation of L. Because of the sparsity structure, this computationa l cost is in general O (n), C(n 3 / 2 ), 
and 0(n 2 ) for problems in one, two, and three dimensions respectively (see Rue and Held . 2005b . If the spline 



bases are used without the markov approximation, the computational cost instead is 0(n ) since Q w then is 
dense. It should be noted here that any basis could be used in the SPDE approximation, but in order to get good 
computational properties we need both Q w and B^Q^Bi to be sparse. This is the reason for why for example 
Fourier bases are not appropriate to use in the SPDE formulation since B\ in this case always is a dense matrix. 

4.2 Process convolutions 

In the process convolution method, the Gaussian random field X(s) on M. d is specified as a process convolution 

X(s) = J jfe(s,u)B(du), (15) 

where k is some deterministic kernel function and B is a Brownian sheet. One of the advantages with this 
constr uction is that nonstationary fields easily are constructed by allowing the convolution kernel to be dependent 
on location. If, however, the process is stationary we have k(s, u) = k(s — u) and the covariance function for X 
is r(r) = J k(u — r)fe(u) du. Thus, the covariance function and the kernel k are related through 

k=T- i (-^vm)=?- if 1 



. d V V 7 I I . . d 

(2tt)2 J \{2ir)2 



where S is the spectral density for X(s) and T denotes the Fourier transform (IHigdonl 120011) . Since the spectral 



density for a Matern covariance function in dimension d with parameters v, 4> 2 , and k is given by ©, one finds 
that the corresponding kernel is a Matern covariance function with parameters = | — |, ^| = <fi, and k& = k. 
An approximation of (fT5l) which is commonly used in convolution based modeling is 

n 

J'=l 

where m, . . . , u n are some fixed locations in the domain, and Wj are independent zero mean Gaussian variables 
with variances equal to the area associated with each uj. Thus, this approximation is on the form £[]), with basis 
functions Cj( s ) = — u j)- When this approximation is used, Equation (01) is used to calculate the kriging 
predictor. Because the basis functions in the expansion are Matern covariance functions, the matrices Bi and B 2 
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are dense. Thus, even though both and E^, 1 are diagonal matrices, one still has to solve a system on the form 

u=(^ 1 +B[S f 1 B 1 )- 1 v 

where (E^+BjE^Bi) is a dense n by n matrix. The computational cost for both constructing and inverting 
the matrix is 0(mn 2 + n 3 ), where n is the number of basis functions used in the basis expansion. For kriging 
prediction of rh locations, the total computational complexity is 0{rhn + mn 2 + n 3 ). 



4.3 Covariance tapering 

Covariance tapering is not a method for constructing covariance models, but a method for approximating a given 
covariance model to increase the computational efficiency. The idea is simply to to taper the true covariance, 
r(r), to zero beyond a certain range, 9, by multiplying the covariance function with some compactly supported 
positive definite taper function rg{r). Using the tapered covariance, 

r t a P {r) = re(r)r(T), 

the matrix Sy in the expression for the kriging predictor (f3]> is sparse, which facilitates the use of sparse matrix 
techniques that increases the computational efficiency. The taper function should, of course, also be chosen 
such that the basic shape of the true covariance function is preserved, and of especial importance for asymptotic 
cons iderations i s that t he smoothness at the origin is preserved. 



Furrer et al.1 (|2006) studied the accuracy and numerical efficiency of tapered Matern covariance functions, 



and to be able to compare their results to Matern approximations obtained by the wavelet Hilbert space approx- 
imations and the process convolution method, we use their choice of taper functions: 



Wendlandi : 



\Vendland2 : 



r e( T ) = ^max 



re(r) 



( 



max 



1 



1 



1 + 4^ 



-,o 



1 + 6^ 



+ 



35||t|| 



These taper functions were first introduced by IWendlandl (I1995D . For dimension d < 3, the Wendlandi function 
is a valid taper function fo r the Matern covari ance function if v < 1.5, and the Wendland2 functions is a valid 
taper function if v < 2.5. IFurrer et al.l (2006) found that Wendlandi was slightly better than Wendland2 for a 
given v, so we use Wendlandi for all cases when v < 1.5 and Wendland2 if 1.5 < u < 2.5. 
If a tapered Matern covariance is used, the kriging predictor can be written as 



E(X 2 |Y, 7 ) 



-I-- 



where the element on row i and column j in Ej£ P j£ and are given by r tap (si, Sj) and r tap (si, Sj) respect- 
ively. Since the tapered covariance is zero for lags larger than the taper range, 9, many of the elements in 
will be zero. Thus, the three step approach used for the wavelet Markov approximations can be used to solve the 
system u = (S^ + 'E £ )- 1 Y efficiently. Since the number of non-zero elements for row i in is determined 
by the number of measurement locations at a distance smaller than 9 from location s, , the computational cost is 
determined both by the taper range and the spacing of the observations. Thus, if the measurements are irregularly 
spaced, it is hard to get a precise estimate of the computational cost. However, for given measurement locations, 
the taper range can be chosen such that the average number of neighbors to the measurement locations is some 
fixed number kg. The cost for the Cholesky factorization is then similar to the cost for a GMRF with m nodes 
and a neighborhood size kg. 
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4.4 Covariance approximation 



For practical applications of any of the approximation methods discussed here, one is often mostly interested in 
producing kriging predictions which are close to the optimal predictions. The error one makes in the kriging 
prediction is closely related to the methods ability to reproduce the true Matern covariance function. There are 
many different wavelet bases one could consider using in the Markov approximation method, and before we 
consider the kriging problem we will in this section compare some of these bases with respect to their ability to 
reproduce the Matern covariance function so that we can choose only a few of the best methods to compare in 
the next section. As a reference, we also include the process convolution approximation in this comparison. 

A natural measure of the error in the covariance approximation is a standardized L 2 norm of the difference 
between the true-, and approximate covariance functions, 



Note here that the true covariance function r(s, u) is stationary and isotropic, while the approximate covariance 
function r(s, u), for the basis expansion CD, generally is not. For the wavelet approximations and the process 
convolutions, e r is periodic in s since the approximation error in general is smaller where the basis functions are 
centered, and we therefore use the mean value of e(s) over the studied region as a measure of the covariance 
error. 

We use the different methods to approximate the covariance function for a Matern field on the square [0, 10] x 
[0, 10] in M 2 . The computational complexity for the kriging predictions depend on the number of basis functions, 
n, used in the approximations. For the Markov approximated spline bases and the Daubechies 3 basis, this 
complexity is 0(n 3 / 2 ) whereas it is 0(n 3 ) for the spline bases if the Markov approximation is not used and 
for the process convolution method. We therefore use 100 2 basis functions for the 0(ra 3 / 2 ) methods and 100 
basis functions for the other methods to get the covariance error for the methods when the computational cost is 
approximately equal. 



Figure[3]shows the covariance error for the different methods as functions of the approximate range, k 1 v / 8p', 

of the true covariance function for three different values of v. There are several things to note in this figure: 

1. The covariance error decreases for all methods as the range of the true covariance function increases. This 
is not surprising since the error will be small if the distance between the basis functions (which is kept 
fixed) is small compared to the true range. 

2. The solid lines correspond to Markov approximations, which have computational complexity 0(n 3 / 2 ) 
for calculating the kriging predictor, and the approximations with computational complexity 0(n 3 ) have 
dashed lines in the figure. 

3. There is no convolution kernel estimate for v = 1 since the convolution kernel has a singularity in zero in 
this case. For the other cases, the locations {uj} for the kernel basis functions were placed on a regular 
10x10 lattice in the region. 

4. The error one makes by the Markov approximation of the spline bases becomes larger for increasing order 
of the splines. Note that the third order spline basis is best without the approximation whereas the first 
order spline basis is best if the Markov approximation is used. 

It is clear from the figure that the Markov approximations have a much lower covariance error for the same 
computational complexity. Among these, the Daubechies 3 basis is best for large ranges whereas the Markov 
approximated first order spline basis is best for short ranges. The higher order spline bases have larger covariance 
errors so we from now on focus on the first order spline basis and the Daubechies 3 basis. 



/(r(s,u) - r(s,u)) 
J r(s, u) 2 du 



2 du 



(16) 
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Figure 3: Numeric approximations of the L 2 -norm ([TBI shown as a function of approximate range for different 
values of v and different bases in M 2 . In this figure, 100 2 basis functions are used for the bases with Markov struc- 
ture (solid lines), and 100 basis functions are used for the other bases (dashed lines). This gives approximately 
the same computational complexity for kriging prediction. 



4.5 Spatial prediction 

In the previous section, several bases were compared with respect to their ability to approximate the true cov- 
ariance function when used in an approximation on the form (GQ) of a Gaussian Matern field. The comparison 
showed that the Daubechies 3 (DB3) basis and the Markov approximated linear spline (SI) basis are most ac- 
curate for a given computational complexity. In this section, the spatial prediction errors for these two wavelet 
Markov approximations are compared with the process convolution method and the covariance tapering method. 
In the comparisons, n ote that the S 1 basis is essentially of the same type of piecewise linear basis as used in 



Lindgren et all (|201 lh . so that the results here also apply to that paper. 



Simulation setup 

Let X(s) be a Matern field with shape parameter v (chosen later as 1, 2, or 3) and approximate correlation range 
r (later varied between 0.1 and 4). The range r determines k through the relation k = ^/&ur~ l and the variance 
parameter 4> = 47iT(z/ + X)n v T{v) is chosen such that the variance of X(s) is 1. We measure X at 5000 
measurement locations chosen at random from a uniform distribution on the square [0, 5] x [0, 5] in M 2 using 
the measurement equation (0, where £ (s) is Gaussian white noise uncorrected with X with standard deviation 
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a = 0.01. 

Given these measurements, spatial prediction of X to all locations on a 70 x 70 lattice of equally spaced 
points in the square is performed using the optimal kriging predictor, the wavelet Markov approximations, the 
process convolution method, and the covariance tapering method. For each approximate method, the sum of 
squared differences between the optimal kriging prediction and the approximate method's kriging prediction is 
used as a measure of kriging error. 

We compare the methods for v = 1, 2, 3, and for each v we test 40 different ranges varied between 0.1 and 
4 in steps of 0.1. For a given v and a given range, 20 data sets are simulated and the average kriging error is 
calculated for each method based on these data sets. 

Choosing the number of basis functions 

To obtain a fair comparison between the different methods, the number of basis functions for each method should 
be chosen such that the computation time for the kriging prediction is equal for the different methods. The 
computations needed for calculating the prediction can be divided into three main steps as follows 

Step 1. Build all matrices except M in step 3 necessary to calculate the kriging predictor. 

Step 2. Solve the matrix inverse problem for the given method: 

SI, DB3 and Process convolution: u = (S^ 1 + B 1 r E^ 1 Bi) _1 BiSg 1 Y, 

Tapering: u = + £ £ ) ~* Y, 

Optimal kriging: u = (E Xl + X^) -1 Y. 

Step 3. Depending on which method that is used, build M = B2, M = H x ^ x , or M = Y>x 2 x± and calculate 
the kriging predictor X = Mu. 

For the optimal kriging predictor, and in some cases for the Tapering method, the matrix M cannot be calculated 
and stored at once due to memory constraints if the number of measurements is large. Each element in X is then 
constructed separately as Xj = MjU, where Mj is a row in M. It is then natural to include the time it takes to 
build the rows in M in the time it takes to calculate X, which is the reason for including the time it takes to build 
M in step 3 instead of step 1. 

The computation time for the first step will be very dependent on the actual implementation, and we will 
therefore focus on the computation time for the last two steps when choosing the number of basis functions. If 
one only does kriging prediction to a few locations, the second step will dominate the computation time whereas 
the third step can dominate if kriging is done to many locations. To get results that are easier to interpret, we 
choose the number of basis functions such that the time for the matrix inverse problem in step 2 is similar for the 
different methods. 

Now since the computational complexity for step 2 is 0(n 3 ) for the convolution method and 0(n 3 / 2 ) for 
the Markov methods, one would think that if n basis functions are used in the convolution method and n 2 basis 
functions are used for the Markov methods, the computation time would be equal. Unfortunately it is not that 
simple. If two different methods have computational complexity 0(n 3 ), this means that the computation time 
scales as n 3 when n is increased for both methods; however, the actual computation time for a. fixed n can be quite 
different for the two methods. For example, DB3 is approximately 6 times more computationally demanding than 
S 1 for the same number of basis functions. The reason being that the DB3 basis functions have larger support than 
the SI basis functions and this cases the matrices Bi and S^ 1 for DB3 to contain approximately 6 times as many 

nonzero elements compared to S 1 for the same number of basis functions. However, the relative computation 

3/2 

time will scale as n-J if n\ is increased for both methods. 
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Figure 4: Kriging errors for the different methods as a functions of the true covariance function's range. For each 
range, the values are calculated as the mean of 20 simulations. The lower limit of the bands around the curves 
are the estimate minus the standard deviation of the samples, and the upper limit is the estimate plus the standard 
deviation. 



To get approximately the same computation time for step 2 for the different approximation methods, the 
number of basis functions for SI is fixed to 100 2 . Since DB3 is approximately six times more computa t ionally 
demanding, the number of basis functions for this method is set to 1600. As mentioned in lLindgren et al.l (|201lh . 
one should extend the area somewhat to avoid boundary effects from the SPDE formulation used in the Markov 
methods. We therefore expand the area with two times the range in each direction which results in a slightly 
higher number of basis functions used in the computations. 

The computation time for SI and DB3 increases if u is increased since the precision matrix for the weights 
contain more nonzero elements for larger values of v. Therefore we use 625 basis functions placed on a regular 
25 x 25 lattice in the kriging area for the convolution method when v = 2 and use 841 basis functions placed 
on a regular 29 x 29 lattice when v = 3. For the tapering method we chose the tapering range 9 such that the 
expected number of measurements within a circle with radius 9 to each kriging location is similar to the number 
of neighbors to the weights in the SI method. For v = 1, v = 2, and v = 3 this gives a tapering ranges of 
0.4, 0.55, and 0.7 respectively and results in approximately the same number of nonzero elements in the tapered 
covariance matrix as in the precision matrix Q for the S 1 basis. 
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Optimal prediction SI basis Convolution basis Tapered covariance 




Figure 5: An example of an optimal kriging prediction and predictions using the SI basis, the convolution 
basis, and a tapered covariance when v = 2 and the covariance range is 1. The predictions are based on 5000 
observations and are calculated for a 200 x 200 grid in the square [0, 5] x [0, 5]. The number of basis functions 
and the tapering range are chosen such that the total time for Step 2 and Step 3 is approximately equal for the 
different methods. 

Results 

In Figure |4] can the average kriging errors for the different methods be seen as functions of the true covariance 
function's approximate range r. The values for a given u and r is an average of 20 simulations. The convolution 
kernels are singular if v = 1, so there is no convolution estimate for this case. The tapering estimate is best for 
short ranges, which is not surprising since the covariance matrix for the measurements not is changed much by 
the tapering if the true range then is shorter than the tapering range. For larger ranges, however, the tapering 
method has a larger error than the other methods. One reason for this is that the tapered covariance function is 
very different from the true covariance function if the true range is much larger than the tapering range. Another 
reason is that the prediction for all locations that do not have any measurements closer than the tapering range 
is zero in the tapering method. The convolution method has a similar problem if the effective range of the basis 
functions is smaller than the distance between the basis functions. In this case, the estimates for all locations that 
are not close to the center of some basis function have a large bias towards zero. These problems can clearly be 
seen in Figure [5j where the optimal kriging prediction, and the predictions for S 1 , the tapering method, and the 
convolution method, are shown for an example where u = 2 and the range is 1. 

The computation times for the different methods are shown in Table [TJ These computation times are obtained 
using an implementation in MatlatQ on a computer with a 3.33GHz Intel Xeon X5680 processor. As intended, the 
time for step 2 is similar for the different methods whereas there is a larger difference between the computation 
time for step 3 because the computation time for the kriging prediction scales differently with the number of 
kriging locations for the different methods. Note that the wavelet methods are less computationally demanding 
than the tapering method and the convolution method when doing kriging to many locations. The reason being 
that the matrix M in step 3 can be constructed without having to do costly covariance function evaluations. 

As mentioned previously is the computation time for step 1 very dependent on the actual implementation. 
However, as for step 3 can the Markov method's matrices be constructed without doing any covariance function 
evaluations which is the reason for the faster computation time. One thing to note here is that if the parameters 
are changed (for example when doing parameter estimation), one does not have to construct all matrices again in 
the Markov methods as one have to do for the other two methods. 

In conclusion we see that S 1 is both faster and has a smaller kriging error for all ranges when compared to 
DB3 and the convolution method and compared to the tapering method it has a smaller kriging error for all but 
very short ranges. Since the tapering methods computational cost varies with the tapering range, we conclude this 

'implementation available at |http : / /www . maths ■ 1th . se/mat stat/ staff/bo Iin/| 
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V = 1 





Optimal 


DB3 


SI 


Conv. 


Taper 


Step 1 
Step 2 
Step 3 


37.68 (6.357) 
5.074 (0.277) 
36.48 (6.231) 


0.490 (0.049) 
0.113 (0.014) 
0.293 (0.026) 


0.423 (0.027) 
0.088 (0.007) 
0.248 (0.018) 




2.771 (0.191) 
0.117 (0.010) 
2.051 (0.127) 


Total 


79.23 (8.906) 


0.896 (0.057) 


0.759 (0.033) 




4.939 (0.229) 


v = 2 


Step 1 
Step 2 
Step 3 


36.19 (6.965) 
5.327 (0.529) 
34.94 (6.695) 


0.600 (0.090) 
0.228 (0.039) 
0.310 (0.049) 


0.489 (0.055) 
0.203 (0.025) 
0.260 (0.036) 


0.961 (0.027) 
0.217 (0.019) 
0.942 (0.027) 


4.184 (1.523) 
0.247 (0.028) 
3.319 (0.251) 


Total 


76.45 (9.675) 


1.138 (0.110) 


0.951 (0.070) 


2.120 (0.043) 


7.750 (1.543) 


v = 3 


Step 1 
Step 2 
Step 3 


42.75 (6.572) 
5.468 (0.380) 
41.36 (6.440) 


0.759 (0.091) 
0.394 (0.051) 
0.315 (0.033) 


0.569 (0.042) 
0.377 (0.035) 
0.266 (0.025) 


5.656 (1.094) 
0.390 (0.024) 
5.522 (1.078) 


6.413 (1.051) 
0.421 (0.035) 
5.460 (0.402) 


Total 


89.58 (9.210) 


1.468 (0.110) 


1.213 (0.060) 


11.57 (1.537) 


12.30 (1.126) 



Table 1 : Average computation times for the results in Figure 01 The values are based on the 800 simulations for 
each value of v. The standard deviations are shown in the parentheses. 

section with a study of how changing the tapering range changes the results in order to get a better understanding 
of which method is to prefer when comparing S 1 and the tapering method. 

4.6 A study of varying the tapering range 

As shown above is the SI method to prefer over the DB3 method and the convolution method in all our test cases 
whereas the tapering method had a smaller kriging error for very short ranges. Since this was done using a fixed 
tapering range chosen such that the computation time for step 2 would be similar to the other methods we now 
look at what happens if the tapering range is varied when keeping the true range fixed. 

The setup is the same as in the previous comparison, a Matern field with v = 2, variance 1 and an approximate 
range r is measured at 5000 randomly chosen locations in a square in M 2 . The difference is that we now keep 
these parameters fixed but instead vary the tapering range from 0.05 to 2 in steps of 0.05. We generate 100 data 
sets and calculate the kriging predictions for the S 1 method and the tapering method for all values of the tapering 
range. Based on these 100 estimates, the average kriging error is calculated for S 1 and for each tapering estimate. 

The results can be seen in Figure [6] The kriging errors are shown in the left panels and the computation times 
are shown in the right panels. The blue lines represent the S 1 method, which obviously does not depend on the 
tapering range, and the yellow lines represent the tapering method. In the left panels, the solid lines show the 
time for step 2 in the computations and the dashed lines show the total time for step 2 and step 3. In the upper 
two panels, the true range r is 1, and 100 2 SI basis functions are used. In this case, SI is more accurate than 
the tapering method for all tapering ranges tested, which is not surprising considering the previous results. In 
the bottom panels of the figure, the true range r is 0.25 and 100 2 SI basis functions are used. This is a case 
where the tapering method was more accurate than SI in the previous study and we see here that the tapering 
method is more accurate for tapering ranges larger than 0.4 and that the time for step 2 is smaller for all tapering 
ranges smaller than 0.46. Thus, by choosing the tapering range between 0.4 and 0.46, the tapering method is 
more accurate and has a smaller computation time for step 2. 

The accuracy of the tapering method increases if the ratio between the tapering range and the true range is 
increased, and the computation time depends on what the distance between the measurements is compared to 
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Kriging error 




Tapering range Tapering range 



Figure 6: The computation time for step 2 (right) and the kriging errors (left) for the co variance tapering method 
(yellow lines) as a function of taper range. The values for the S 1 basis (blue lines) is shown for comparison. 
In the upper panels, the range of the true covariance function is 1 and in the lower panels the range is 0.25. 
The colored lines are averages of 100 simulations, and the grey bands indicates the standard deviation of these 
samples. The solid lines in the right panels show the computation time for step 2 whereas the dashed lines show 
the total computation time for step 2 and step 3 when calculating the kriging predictions using the two methods. 



the tapering range. If the distance between the measurements is large, the tapering method is fast, whereas it is 
slower if the distance is small. Thus, the situation where the tapering method performs the best is when the true 
covariance range is short compared to the distance between the measurements. However, also for the case when 
the true range is small, the total time it takes to calculate the tapering prediction is larger than the time it takes to 
calculate the S 1 prediction unless the nu mber of krig i ng loc ations is small. 

In this work, the taper functions that Furrer et al. ( 20061) found to be best for each value of v are used, but the 
results may be improved by using other taper functions. Changing the taper function will, however, not change 
the fact that the prediction for all locations that do not have any measurements closer than the tapering range is 
zero in the tapering method and that the tapered covariance function is very different from the true covariance 
function if the tapering range is short compared to the true range. Finally, the results for all methods could 
be improved by finding optimal parameters for the approxi mate models instea d of using the parameters for the 
true Matern covariance. For the tapering method, however, iFurrer et all (120061) found that this only changed the 
relative accuracy by one or two percent. 
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5 Conclusions 



Because of the increasing number of large environmental data sets, there is a need for computationally efficient 
statistical models. To be useful for a broad range of practical applications, the models should contain a wide 
family of stationary covariance functions, and be extendable to nonstationary covariance structures, while still 
allowing efficient calculations for large problems. 

The SPDE formulation of the Matern family o f covariance functions has these properties, as it ca n be extended 
to more general nonstationary spatial models (see lBolin and Lindgrenll201lLlLindgren et al.Ll201ll . for details on 
how this can be done), and allows for efficient and accurate Markov model representations. In addition, as shown 
by the simulation comparisons, these Markov methods are more efficient and accurate than both the process 
convolution approach and the covariance tapering method for approximating Matern fields. 

Depending on the context in which a model is used, different aspects are important to make it computationally 
efficient. If, for example, the model is used in MCMC simulations, one should be able to generate samples from 
the model given the parameters efficiently, or if the parameters are estimated in a numerical maximum likelihood 
procedure, one must be able to evaluate the likelihood efficiently. To limit the scope of this article, only the 
computational aspects of kriging was considered. However, for practical applications, parameter estimation is 
likely the most computationally demanding part of the analysis. If maximum likelihood estimation is performed 
using numerical optimization of the likelihood, matrix inverses similar to the one in Step 2 in Table [Qhave to be 
performed in each iteration of the optimization, and it is therefore important that these inverses can be calculated 
efficiently. We have not discussed estimation here, but the Markov methods are likely most efficient in this 
situation as well because these do not require costly Bessel function evaluations when calculating the likelihood. 
However, this is left for future research to investigate in more detail. An introdu ction to maximum l i keliho od 
estimation using the SPDE formulation can be found in Bolin and Lindsren and Lindsren et al. (2011). 

Finally, some relevant methods, such as ICressie and Johannessonl ([2008) and lBanerjee et all (12008b . was not 
included in the comparison in order to keep it relatively short and also because they are difficult to compare with 
the methods discussed here. It would be interesting to include more methods in the comparison, but we leave this 
for future work. 
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